AITopics | restless bandit

restless bandit

Weakly Coupled Deep Q-Networks

Neural Information Processing Systems, Feb-15-2026, 17:43:42 GMT

… "subagents," one for each subproblem, and then combine their solutions to establish …

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Country:

Asia > Middle East > Jordan (0.04)
Oceania > New Zealand (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
(2 more...)

Industry:

Transportation > Ground > Road (0.94)
Transportation > Electric Vehicle (0.94)
Automobiles & Trucks (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems

Young Hun Jung, Ambuj Tewari

Neural Information Processing Systems, Feb-11-2026, 19:38:28 GMT

Restless bandit problems are instances of non-stationary multi-armed bandits.
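To make "restless" concrete, here is a minimal Python sketch, not taken from the paper: each arm is a hypothetical two-state Markov chain whose state changes on every round whether or not it is pulled, so the reward an arm would pay drifts over time from the learner's point of view. The class `RestlessArm`, the layout of `p_good`, and the reward rule are illustrative assumptions; the paper's episodic setting and its Thompson sampling learner are not reproduced here.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# p_good[action][state]: probability that the arm is in the "good" state (1)
# on the next round. action 0 = passive, action 1 = active (pulled).
Transitions = Tuple[Tuple[float, float], Tuple[float, float]]


@dataclass
class RestlessArm:
    p_good: Transitions
    state: int = 0  # 0 = "bad", 1 = "good" (hypothetical two-state arm)

    def reward(self, action: int) -> float:
        # Assumed reward rule: 1 only when the arm is pulled in the good state.
        return float(action == 1 and self.state == 1)

    def step(self, action: int, rng: random.Random) -> None:
        # Every arm transitions every round, pulled or not: this is what
        # makes the bandit "restless" rather than "rested".
        p = self.p_good[action][self.state]
        self.state = 1 if rng.random() < p else 0


Policy = Callable[[List[RestlessArm], int], Optional[int]]


def simulate(arms: List[RestlessArm], horizon: int, policy: Policy, seed: int = 0) -> float:
    """Run `horizon` rounds; `policy` returns the arm index to pull, or None to idle."""
    rng = random.Random(seed)
    total = 0.0
    for t in range(horizon):
        chosen = policy(arms, t)
        for i, arm in enumerate(arms):
            action = 1 if i == chosen else 0
            total += arm.reward(action)
            arm.step(action, rng)
    return total
```

Even an arm that is never pulled keeps moving between states, which is why a learner cannot treat an unobserved arm's reward distribution as frozen.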

artificial intelligence, bandit, data mining, (18 more...)

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Artificial Intelligence (0.71)
Information Technology > Data Science > Data Mining > Big Data (0.55)

Learning Infinite-Horizon Average-Reward Restless Multi-Action Bandits via Index Awareness

Neural Information Processing Systems, Feb-9-2026, 18:36:32 GMT

data mining, machine learning, restless bandit, (15 more...)

Country:

North America > United States > New York > Broome County > Binghamton (0.04)
Asia > India (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(4 more...)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining (0.68)

Learning Infinite-Horizon Average-Reward Restless Multi-Action Bandits via Index Awareness

Neural Information Processing Systems, Dec-24-2025, 10:53:33 GMT

We consider online restless bandits with average reward and multiple actions, where the state of each arm evolves according to a Markov decision process (MDP), and the reward of pulling an arm depends on both the current state of the corresponding MDP and the action taken. Since finding the optimal control is typically intractable for restless bandits, existing learning algorithms are often computationally expensive or have a regret bound that is exponential in the number of arms and states. In this paper, we advocate \textit{index-aware reinforcement learning} (RL) solutions, which design RL algorithms that operate on a much lower-dimensional subspace by exploiting the inherent structure of restless bandits. Specifically, we first propose novel index policies that address dimensionality concerns and are provably optimal. We then leverage the indices to develop two low-complexity index-aware RL algorithms: (i) GM-R2MAB, which has access to a generative model; and (ii) UC-R2MAB, which learns the model using an upper-confidence-style online exploitation method. We prove that both algorithms achieve a sub-linear regret that is only polynomial in the number of arms and states. A key differentiator between our algorithms and existing ones is that our RL algorithms contain a novel exploitation step that leverages our provably optimal index policies for decision-making.
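The decision step of an index policy can be sketched as follows. This is a generic binary-action version, assuming a hypothetical `index(arm, state)` function and a budget of `budget` active arms per round; the paper's indices are defined for multiple actions and are learned through GM-R2MAB or UC-R2MAB, neither of which this sketch implements.

```python
import heapq
from typing import Callable, List, Sequence


def index_policy(
    states: Sequence[int],
    index: Callable[[int, int], float],
    budget: int,
) -> List[int]:
    """Return a 0/1 action per arm, activating the `budget` arms with the largest index.

    `index(arm, state)` is a hypothetical priority function (for example a
    Whittle-style index); ties are broken in favour of the lower arm id.
    """
    n = len(states)
    if not 0 <= budget <= n:
        raise ValueError("budget must be between 0 and the number of arms")
    # Each arm is scored only from its own current state, which is what keeps
    # index policies cheap: the joint state space is never enumerated.
    top = heapq.nlargest(budget, range(n), key=lambda i: (index(i, states[i]), -i))
    actions = [0] * n
    for i in top:
        actions[i] = 1
    return actions
```

For example, with a hypothetical index table `[[0.1, 0.9], [0.5, 0.2], [0.3, 0.8]]`, states `[1, 0, 1]` and `budget=2`, the current indices are 0.9, 0.5 and 0.8, so arms 0 and 2 are activated.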

algorithm, infinite-horizon average-reward restless multi-action bandit, name change, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

NeurWIN: Neural Whittle Index Network For Restless Bandits Via Deep RL

Neural Information Processing Systems, Dec-23-2025, 17:32:48 GMT

The Whittle index policy is a powerful tool for obtaining asymptotically optimal solutions to the notoriously intractable problem of restless bandits. However, finding the Whittle indices remains difficult for many practical restless bandits with convoluted transition kernels. This paper proposes NeurWIN, a neural Whittle index network that seeks to learn the Whittle indices for any restless bandit by leveraging mathematical properties of the Whittle indices. We show that a neural network that produces the Whittle index is also one that produces the optimal control for a set of Markov decision problems.
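For context, the Whittle index of a state can be computed directly when an arm's model is known; this is the quantity NeurWIN aims to learn. The sketch below uses the classical subsidy-based definition and is not NeurWIN itself. It assumes a discounted-reward arm with known transition matrices `P[a]` and reward vectors `R[a]` (action 0 passive, action 1 active), assumes the arm is indexable, and finds by bisection the subsidy `lam` paid for the passive action at which both actions are equally attractive in the given state. The discount `gamma` and the search interval `[lo, hi]` are illustrative choices.

```python
import numpy as np


def q_values(P, R, lam, gamma=0.9, iters=2000, tol=1e-10):
    """Value iteration for one arm whose passive action earns an extra subsidy `lam`."""
    V = np.zeros(P[0].shape[0])
    for _ in range(iters):
        q0 = R[0] + lam + gamma * P[0] @ V  # passive: reward + subsidy
        q1 = R[1] + gamma * P[1] @ V        # active
        V_new = np.maximum(q0, q1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    q0 = R[0] + lam + gamma * P[0] @ V
    q1 = R[1] + gamma * P[1] @ V
    return q0, q1


def whittle_index(P, R, state, lo=-10.0, hi=10.0, gamma=0.9, tol=1e-6):
    """Bisection on the subsidy; assumes indexability and an index inside [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        q0, q1 = q_values(P, R, mid, gamma)
        if q1[state] > q0[state]:
            lo = mid  # active still preferred: the subsidy is too small
        else:
            hi = mid  # passive (weakly) preferred: the subsidy is large enough
    return 0.5 * (lo + hi)
```

For an arm whose action does not affect its transitions, the index reduces to the immediate reward difference $R_1(s) - R_0(s)$, which makes a quick sanity check. NeurWIN's motivation is that this kind of exact computation becomes hard when the transition kernel is convoluted or unknown.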

artificial intelligence, machine learning, reinforcement learning, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.39)

8912b4892064a4f08a0c04f92913c134-Paper-Conference.pdf

Neural Information Processing Systems, Oct-9-2025, 00:40:43 GMT

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Country:

Asia > Middle East > Jordan (0.04)
Oceania > New Zealand (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Industry:

Transportation > Ground > Road (0.94)
Transportation > Electric Vehicle (0.94)
Automobiles & Trucks (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems

Young Hun Jung, Ambuj Tewari

Neural Information Processing Systems, Oct-2-2025, 11:24:06 GMT

Neural Information Processing Systems http://nips.cc/

bandit, data mining, machine learning, (19 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.94)
Information Technology > Data Science > Data Mining > Big Data (0.65)

When are Kalman-Filter Restless Bandits Indexable?

Christopher R. Dance, Tomi Silander

Neural Information Processing Systems, Oct-2-2025, 08:37:10 GMT

We study the restless bandit associated with an extremely simple scalar Kalman filter model in discrete time. Under certain assumptions, we prove that the problem is indexable in the sense that the Whittle index is a non-decreasing function of the relevant belief state. In spite of the long history of this problem, this appears to be the first such proof. We use results about Schur-convexity and mechanical words, which are particular binary strings intimately related to palindromes.
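In a Kalman-filter restless bandit, the belief state of an arm is the posterior variance of its hidden scalar. A minimal sketch, assuming the hypothetical dynamics $x_{t+1} = a x_t + w_t$ with $w_t \sim \mathcal{N}(0, q)$ and, only when the arm is activated, an observation $y_t = x_t + v_t$ with $v_t \sim \mathcal{N}(0, r)$ (the paper's exact model and parameterisation may differ), shows how activating an arm shrinks its variance while a passive arm's variance keeps growing. The names `variance_update` and `belief_trajectory` are illustrative.

```python
from typing import Iterable, List


def variance_update(p: float, observe: bool, a: float = 1.0, q: float = 1.0, r: float = 1.0) -> float:
    """One step of the posterior-variance recursion of a scalar Kalman filter.

    Assumed model: x' = a*x + w with Var(w) = q, and, only if the arm is
    activated (observe=True), a measurement y = x' + v with Var(v) = r.
    """
    p_pred = a * a * p + q            # predict: uncertainty grows every round
    if not observe:
        return p_pred                 # passive arm: no measurement, no correction
    return p_pred * r / (p_pred + r)  # measurement update shrinks the variance


def belief_trajectory(p0: float, schedule: Iterable[bool], **params: float) -> List[float]:
    """Posterior variances produced by a sequence of observe/skip decisions."""
    out, p = [], p0
    for observe in schedule:
        p = variance_update(p, observe, **params)
        out.append(p)
    return out
```

Under an always-observe schedule with $a = q = r = 1$, the variance converges to $(\sqrt{5}-1)/2 \approx 0.618$, the fixed point of $p \mapsto (p+1)/(p+2)$. Indexability, in the sense of the abstract, asks whether the Whittle index is monotone in this variance $p$.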

mechanical word, palindrome, restless bandit, (14 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Spain > Galicia > Madrid (0.04)
Europe > France (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.94)
Information Technology > Data Science > Data Mining (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)

71f003060ce1e8b6b4856023b67cda5d-Paper-Conference.pdf

Neural Information Processing Systems, Aug-15-2025, 20:02:07 GMT

data mining, machine learning, reinforcement learning, (18 more...)

Country:

Asia > India (0.04)
North America > United States > New York > Broome County > Binghamton (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(4 more...)

Genre: Research Report (0.68)

Industry:

Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Non-Stationary Restless Multi-Armed Bandits with Provable Guarantee

Yu-Heng Hung, Ping-Chun Hsieh, Kai Wang

arXiv.org Artificial Intelligence, Aug-15-2025

Online restless multi-armed bandits (RMABs) typically assume that each arm follows a stationary Markov decision process (MDP) with fixed state transitions and rewards. However, in real-world applications such as healthcare and recommendation systems, these assumptions often break down due to non-stationary dynamics, posing significant challenges for traditional RMAB algorithms. In this work, we consider $N$-armed RMABs with non-stationary transitions constrained by a bounded variation budget $B$. Our proposed \rmab\; algorithm integrates sliding-window reinforcement learning (RL) with an upper confidence bound (UCB) mechanism to simultaneously learn the transition dynamics and their variations. We further establish that \rmab\; achieves an $\widetilde{\mathcal{O}}(N^2 B^{\frac{1}{4}} T^{\frac{3}{4}})$ regret bound over horizon $T$ under a relaxed definition of regret, providing the first theoretical framework for non-stationary RMAB problems.
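The sliding-window idea can be sketched for a single arm: estimate transition probabilities only from the most recent `window` transitions, so that data collected before a change in dynamics is eventually forgotten, and attach a UCB-style confidence radius that shrinks with the number of samples. This is an illustrative sketch, not the paper's algorithm; the class name `SlidingWindowTransitions`, the window size, and the per-entry Hoeffding radius $\sqrt{\ln(2/\delta)/(2n)}$ are assumptions.

```python
import math
from collections import Counter, deque
from typing import Deque, Dict, Hashable, Tuple


class SlidingWindowTransitions:
    """Estimate P(s' | s, a) for one arm from only the most recent `window` transitions."""

    def __init__(self, window: int):
        # deque(maxlen=...) discards the oldest transition automatically.
        self.buffer: Deque[Tuple[Hashable, Hashable, Hashable]] = deque(maxlen=window)

    def record(self, s: Hashable, a: Hashable, s_next: Hashable) -> None:
        self.buffer.append((s, a, s_next))

    def estimate(self, s: Hashable, a: Hashable, delta: float = 0.05) -> Tuple[Dict[Hashable, float], float]:
        """Return (empirical next-state distribution, per-entry confidence radius)."""
        counts = Counter(sn for (ss, aa, sn) in self.buffer if ss == s and aa == a)
        n = sum(counts.values())
        if n == 0:
            return {}, 1.0  # no data inside the window: maximal uncertainty
        p_hat = {sn: c / n for sn, c in counts.items()}
        # Hoeffding bound for a single probability entry (assumed radius).
        radius = math.sqrt(math.log(2 / delta) / (2 * n))
        return p_hat, min(1.0, radius)
```

A larger window gives tighter radii when the dynamics are stable but reacts more slowly to drift; in sliding-window methods the window is typically tuned against the amount of variation, which here is bounded by $B$.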

data mining, machine learning, reinforcement learning, (20 more...)

arXiv:2508.10804

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
